Complex real-life data sets in Grid simulations

Authors

Dalibor Klusáček

Hana Rudová

Abstract

Widely used workload sources such as the Parallel Workloads Archive (PWA) [4] or the Grid Workloads Archive (GWA) [5] often provide data sets that are still unrealistic. Typically, very limited information is available about grid/cluster parameters such as architecture, speed, RAM size, or resource-specific policies. Moreover, no information about background load, resource failures, or specific user requests is available. In heterogeneous environments, users often specify a small subset of machines that are suitable for running their jobs. This subset is usually defined by the resource owner's policy (the user is allowed to use a given cluster), by the user's requirements (the user requests a property offered by only some clusters), or by both. When one tries to create a good scheduling algorithm and compare it with current approaches such as PBSpro [3], all such information and constraints are crucial, since they make the algorithm design much more complex. So far, we have been able to collect a complex real-life data set from the Czech national Grid infrastructure MetaCentrum [6] that covers many of the issues mentioned above. It represents 5 months of execution and involves 103 620 jobs completed on 14 heterogeneous clusters with 806 CPUs. The data set includes exact machine parameters covering both the hardware setup (speed, architecture, RAM size) and supported properties (e.g., hardware-, institution-, or queue-based restrictions). For each job, the queue to which the job was originally submitted is known, as well as the maximum runtime after which the job would be killed. The numbers, types, time limits, and priorities of all queues are also known. Moreover, machine failures and restarts are known, together with the list of temporarily dedicated, and thus unavailable, machines. In contrast to the GWA or PWA sources, our data set allows far more realistic simulations involving all of the preceding features.
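The job-to-machine suitability described above (owner's policy combined with the user's requested properties) can be illustrated with a minimal Python sketch. The class names, the `allowed_users` mapping, the property strings, and the rule that a cluster without an entry is open to everyone are all assumptions made for illustration; they are not the format of the MetaCentrum data set.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Cluster:
    # Hypothetical cluster record: name, CPU count, and offered properties.
    name: str
    cpus: int
    properties: frozenset = field(default_factory=frozenset)


@dataclass(frozen=True)
class Job:
    # Hypothetical job record: owner, CPU demand, and requested properties.
    user: str
    cpus: int
    required_properties: frozenset = field(default_factory=frozenset)


def suitable_clusters(job, clusters, allowed_users):
    """Return names of clusters passing both the owner's policy and the job's requirements."""
    result = []
    for cluster in clusters:
        # Owner's policy: a cluster with no entry is treated as open to all users (assumption).
        permitted = allowed_users.get(cluster.name)
        if permitted is not None and job.user not in permitted:
            continue
        # User's requirements: every requested property must be offered by the cluster.
        if not job.required_properties <= cluster.properties:
            continue
        # Capacity: the cluster must have at least as many CPUs as the job needs.
        if job.cpus > cluster.cpus:
            continue
        result.append(cluster.name)
    return result


clusters = [
    Cluster("alpha", 64, frozenset({"x86_64", "infiniband"})),
    Cluster("beta", 16, frozenset({"x86_64"})),
    Cluster("gamma", 128, frozenset({"x86_64", "infiniband"})),
]
allowed_users = {"gamma": {"alice"}}  # only alice may use gamma
job = Job("bob", 32, frozenset({"infiniband"}))
print(suitable_clusters(job, clusters, allowed_users))  # ['alpha']
```

In this toy setup, only one of three clusters remains eligible for the job, which is the kind of restriction a scheduler cannot see when such constraints are missing from the workload trace.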
We have studied the behavior of several objective functions in our solution that cover typical requirements, such as average job slowdown, response time, waiting time, and the algorithm's runtime. We use schedule-based algorithms involving Local Search (sched-LS), which we have been developing for a couple of years [1], as well as widely used queue-based solutions such as FCFS, EASY Backfilling [2], and the PBSpro algorithms [3], which represent multiple-queue, priority-based backfilling. In our experiments, we focused on two scenarios. The SIMPLE scenario does not simulate dedicated resources or failures. Moreover, all jobs can be executed on any cluster (if enough CPUs are available), so SIMPLE represents the typical amount of information available in the GWA or PWA data sets. The COMPLEX scenario, on the other hand, uses all additional information available in our data set, such as queue priorities, machine failures, and the additional machine and job properties that define job-to-machine suitability. As observed during the experiments, the difference between the SIMPLE and COMPLEX setups is dramatic, as shown in the figure depicting the average waiting time (on a logarithmic scale). In the SIMPLE case, the differences between algorithms are quite small, while the COMPLEX data set introduces huge differences among algorithms, so that the previously acceptable FCFS and EASY Backfilling now degrade dramatically. Similar behavior was visible for all of the above-mentioned objective functions. It is clear that a complex and "rich" data set influences the algorithms' performance and causes significant differences in the values of the objective functions. We suggest that, besides the PWA and the GWA, complex data sets should also be used to evaluate existing and newly proposed algorithms under harder conditions. Therefore, at the time of the conference, our data set will be publicly available for further open research.
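As a rough illustration of the job-level objective functions named above, the following Python sketch computes average waiting time, response time, and slowdown from per-job timestamps. The definitions used here (waiting time as start minus submit, response time as end minus submit, slowdown as response time divided by runtime) are the commonly used ones and are an assumption; the abstract does not spell them out. The `FinishedJob` record and its field names are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FinishedJob:
    # Hypothetical record of a completed job; all times in seconds.
    submit: float
    start: float
    end: float


def average_metrics(jobs):
    """Return average waiting time, response time, and slowdown over completed jobs."""
    if not jobs:
        raise ValueError("at least one job is required")
    n = len(jobs)
    # Waiting time: how long the job sat in the queue before it started.
    waits = [j.start - j.submit for j in jobs]
    # Response time: total time from submission to completion.
    responses = [j.end - j.submit for j in jobs]
    # Slowdown: response time relative to the actual runtime (assumed to be > 0).
    slowdowns = [(j.end - j.submit) / (j.end - j.start) for j in jobs]
    return {
        "avg_wait": sum(waits) / n,
        "avg_response": sum(responses) / n,
        "avg_slowdown": sum(slowdowns) / n,
    }


jobs = [
    FinishedJob(submit=0.0, start=0.0, end=10.0),   # started immediately
    FinishedJob(submit=0.0, start=30.0, end=40.0),  # waited 30 s for a 10 s run
]
print(average_metrics(jobs))
# {'avg_wait': 15.0, 'avg_response': 25.0, 'avg_slowdown': 2.5}
```

Because slowdown divides by the runtime, short jobs that wait a long time dominate the average, which is one reason waiting time and slowdown are usually reported side by side.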





Publication date: 2009
